Outcome-Driven Reinforcement Learning via Variational Inference
While reinforcement learning algorithms provide automated acquisition of
optimal policies, practical application of such methods requires a number of
design decisions, such as manually designing reward functions that not only
define the task, but also provide sufficient shaping to accomplish it. In this
paper, we view reinforcement learning as inferring policies that achieve
desired outcomes, rather than as a problem of maximizing rewards. To solve this
inference problem, we establish a novel variational inference formulation that
allows us to derive a well-shaped reward function which can be learned directly
from environment interactions. From the corresponding variational objective, we
also derive a new probabilistic Bellman backup operator and use it to develop
an off-policy algorithm to solve goal-directed tasks. We empirically
demonstrate that this method eliminates the need to hand-craft reward functions
for a suite of diverse manipulation and locomotion tasks and leads to effective
goal-directed behaviors.
Comment: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
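As a schematic of the inference view described above, conditioning on a desired outcome g and introducing a variational trajectory distribution q(τ) yields a generic evidence lower bound of the following shape. This is standard control-as-inference notation assumed here for illustration, not the paper's exact derivation:

```latex
% Generic ELBO on the log-probability of achieving outcome e = g,
% with q(\tau) the trajectory distribution induced by the learned policy
% and p(\tau) the prior trajectory distribution:
\log p(e = g \mid s_0)
  \;\ge\; \mathbb{E}_{q(\tau)}\big[\log p(e = g \mid \tau)\big]
  \;-\; \mathrm{KL}\big(q(\tau)\,\|\,p(\tau)\big)
```

In a bound of this form, the log-likelihood term plays the role of a learned, well-shaped reward signal, which is the role the abstract attributes to the derived reward function; per the abstract, the probabilistic Bellman backup is likewise derived from the corresponding variational objective.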
On Pathologies in KL-Regularized Reinforcement Learning from Expert Demonstrations
KL-regularized reinforcement learning from expert demonstrations has proved
successful in improving the sample efficiency of deep reinforcement learning
algorithms, allowing them to be applied to challenging physical real-world
tasks. However, we show that KL-regularized reinforcement learning with
behavioral reference policies derived from expert demonstrations can suffer
from pathological training dynamics that can lead to slow, unstable, and
suboptimal online learning. We show empirically that the pathology occurs for
commonly chosen behavioral policy classes and demonstrate its impact on sample
efficiency and online policy performance. Finally, we show that the pathology
can be remedied by non-parametric behavioral reference policies and that this
allows KL-regularized reinforcement learning to significantly outperform
state-of-the-art approaches on a variety of challenging locomotion and
dexterous hand manipulation tasks.
Comment: Published in Advances in Neural Information Processing Systems 34 (NeurIPS 2021).
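For context, this setting typically optimizes the standard KL-regularized objective, with a behavioral reference policy π₀ fit to expert demonstrations and a temperature α. The notation below is the standard formulation, not taken from the paper:

```latex
% KL-regularized RL objective with behavioral reference policy \pi_0
% and regularization temperature \alpha:
J(\pi) \;=\; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}
  \Big(r(s_t, a_t) \;-\; \alpha\,\mathrm{KL}\big(\pi(\cdot \mid s_t)\,\|\,\pi_0(\cdot \mid s_t)\big)\Big)\right]
```

The pathology discussed above concerns how this KL term behaves for commonly chosen parametric classes of π₀, and the proposed remedy swaps in non-parametric behavioral reference policies.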
Drug Discovery under Covariate Shift with Domain-Informed Prior Distributions over Functions
Accelerating the discovery of novel and more effective therapeutics is an
important pharmaceutical problem in which deep learning is playing an
increasingly significant role. However, real-world drug discovery tasks are
often characterized by a scarcity of labeled data and significant covariate
shift, a setting that poses a challenge to
standard deep learning methods. In this paper, we present Q-SAVI, a
probabilistic model able to address these challenges by encoding explicit prior
knowledge of the data-generating process into a prior distribution over
functions, presenting researchers with a transparent and probabilistically
principled way to encode data-driven modeling preferences. Building on a novel,
gold-standard bioactivity dataset that facilitates a meaningful comparison of
models in an extrapolative regime, we explore different approaches to induce
data shift and construct a challenging evaluation setup. We then demonstrate
that using Q-SAVI to integrate contextualized prior knowledge of drug-like
chemical space into the modeling process affords substantial gains in
predictive accuracy and calibration, outperforming a broad range of
state-of-the-art self-supervised pre-training and domain adaptation techniques.
Comment: Published in the Proceedings of the 40th International Conference on Machine Learning (ICML 2023).
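To make the idea of a domain-informed prior over functions concrete, here is a minimal sketch assuming a PyTorch regression model: the network is regularized toward a Gaussian prior over function values at context inputs sampled from a relevant region (e.g., drug-like chemical space). The penalty, the unit predictive variance, and all names are illustrative assumptions, not Q-SAVI's actual variational objective:

```python
# Sketch: pull a model's predictions toward a prior over *function values*
# at context points x_c, rather than regularizing weights directly.
import math
import torch

def function_space_penalty(model, context_x, prior_mean=0.0, prior_std=1.0):
    """KL( N(f(x_c), 1) || N(prior_mean, prior_std^2) ), averaged over x_c.
    The unit predictive variance is an assumption made for illustration."""
    preds = model(context_x)  # predicted function values at context inputs
    kl = (math.log(prior_std)
          + (1.0 + (preds - prior_mean) ** 2) / (2.0 * prior_std ** 2)
          - 0.5)
    return kl.mean()

# A training step would then minimize, for some weight lam (hypothetical):
#   loss = task_loss(model, batch) + lam * function_space_penalty(model, context_x)
```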
The StarCraft Multi-Agent Challenge
In the last few years, deep multi-agent reinforcement learning (RL) has
become a highly active area of research. A particularly challenging class of
problems in this area is partially observable, cooperative, multi-agent
learning, in which teams of agents must learn to coordinate their behaviour
while conditioning only on their private observations. This is an attractive
research area since such problems are relevant to a large number of real-world
systems and are also more amenable to evaluation than general-sum problems.
Standardised environments such as the ALE and MuJoCo have allowed single-agent
RL to move beyond toy domains, such as grid worlds. However, there is no
comparable benchmark for cooperative multi-agent RL. As a result, most papers
in this field use one-off toy problems, making it difficult to measure real
progress. In this paper, we propose the StarCraft Multi-Agent Challenge (SMAC)
as a benchmark problem to fill this gap. SMAC is based on the popular real-time
strategy game StarCraft II and focuses on micromanagement challenges where each
unit is controlled by an independent agent that must act based on local
observations. We offer a diverse set of challenge maps and recommendations for
best practices in benchmarking and evaluations. We also open-source a deep
multi-agent RL framework including state-of-the-art algorithms. We
believe that SMAC can provide a standard benchmark environment for years to
come. Videos of our best agents for several SMAC scenarios are available at:
https://youtu.be/VZ7zmQ_obZ0
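For orientation, a minimal random-agent rollout in the style of the open-sourced framework's examples, assuming the `smac` Python package and a local StarCraft II installation:

```python
from smac.env import StarCraft2Env
import numpy as np

env = StarCraft2Env(map_name="8m")          # one of the challenge maps
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_reward = 0.0
while not terminated:
    # Each unit is controlled by an independent agent acting on local
    # observations; here every agent just samples a random valid action.
    actions = []
    for agent_id in range(n_agents):
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(np.random.choice(np.nonzero(avail)[0]))
    reward, terminated, _info = env.step(actions)
    episode_reward += reward
print("episode reward:", episode_reward)
env.close()
```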
Plex: Towards Reliability using Pretrained Large Model Extensions
A recent trend in artificial intelligence is the use of pretrained models for
language and vision tasks, which have achieved extraordinary performance but
also puzzling failures. Probing these models' abilities in diverse ways is
therefore critical to the field. In this paper, we explore the reliability of
models, where we define a reliable model as one that not only achieves strong
predictive performance but also performs well consistently over many
decision-making tasks involving uncertainty (e.g., selective prediction, open
set recognition), robust generalization (e.g., accuracy and proper scoring
rules such as log-likelihood on in- and out-of-distribution datasets), and
adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of
tasks over 40 datasets in order to evaluate different aspects of reliability on
both vision and language domains. To improve reliability, we develop ViT-Plex
and T5-Plex, pretrained large model extensions for vision and language
modalities, respectively. Plex greatly improves the state-of-the-art across
reliability tasks, and simplifies the traditional protocol as it improves the
out-of-the-box performance and does not require designing scores or tuning the
model for each task. We demonstrate scaling effects over model sizes up to 1B
parameters and pretraining dataset sizes up to 4B examples. We also demonstrate
Plex's capabilities on challenging tasks including zero-shot open set
recognition, active learning, and uncertainty in conversational language
understanding.
Comment: Code available at https://goo.gle/plex-cod
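As an illustration of one of the reliability tasks listed above, here is a generic selective-prediction evaluation in NumPy (illustrative only, not Plex's evaluation code): the model abstains on its least-confident inputs, and accuracy is measured as a function of the coverage it retains.

```python
import numpy as np

def selective_accuracy(confidences, correct, coverages=(1.0, 0.9, 0.8, 0.5)):
    """Accuracy on the top-c fraction of inputs, ranked by confidence."""
    order = np.argsort(-np.asarray(confidences))   # most confident first
    correct = np.asarray(correct)[order]
    return {c: correct[: max(1, int(c * len(correct)))].mean()
            for c in coverages}

# Toy usage: confidence loosely tracks correctness, so accuracy should
# rise as coverage falls for a reasonably calibrated model.
conf = np.random.rand(1000)
corr = np.random.rand(1000) < conf
print(selective_accuracy(conf, corr))
```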
On Sequential Bayesian Inference for Continual Learning
Sequential Bayesian inference can be used for continual learning to prevent catastrophic forgetting of past tasks and provide an informative prior when learning new tasks. We revisit sequential Bayesian inference and assess whether using the previous task’s posterior as a prior for a new task can prevent catastrophic forgetting in Bayesian neural networks. Our first contribution is to perform sequential Bayesian inference using Hamiltonian Monte Carlo. We propagate the posterior as a prior for new tasks by approximating it with a density estimator fit to Hamiltonian Monte Carlo samples. We find that this approach fails to prevent catastrophic forgetting, demonstrating the difficulty of performing sequential Bayesian inference in neural networks. From there, we study simple analytical examples of sequential Bayesian inference and continual learning and highlight the issue of model misspecification, which can lead to sub-optimal continual learning performance despite exact inference. Furthermore, we discuss how task data imbalances can cause forgetting. From these limitations, we argue that we need probabilistic models of the continual learning generative process rather than relying on sequential Bayesian inference over Bayesian neural network weights. Our final contribution is to propose a simple baseline called Prototypical Bayesian Continual Learning, which is competitive with the best-performing Bayesian continual learning methods on class-incremental continual learning computer vision benchmarks.
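A toy, exactly conjugate example of the posterior-as-prior recursion the abstract examines: a scalar Gaussian mean with known observation noise, where the posterior can be propagated across tasks in closed form. All details here are illustrative assumptions, far simpler than the paper's Hamiltonian Monte Carlo setting:

```python
import numpy as np

def update(prior_mu, prior_var, data, noise_var=1.0):
    """Exact Gaussian posterior over the mean after observing `data`."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mu = post_var * (prior_mu / prior_var + data.sum() / noise_var)
    return post_mu, post_var

mu, var = 0.0, 10.0                       # broad initial prior
for task_mean in [2.0, -1.0]:             # two "tasks" with different means
    data = np.random.normal(task_mean, 1.0, size=50)
    mu, var = update(mu, var, data)       # yesterday's posterior is today's prior
    print(f"posterior: mu={mu:.2f}, var={var:.4f}")
```

Even with exact inference, a misspecified model (here, a single stationary mean shared across tasks) is pulled between the tasks rather than tracking either one, a simple view of the misspecification issue raised above.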